Brazilian patent BR112013018362B1: Encoding and decoding of event interval positions in an audio signal frame

Abstract:
Encoding and Decoding of Event Interval Positions in an Audio Signal Frame. An apparatus for decoding (10, 40, 60, 410), an apparatus for encoding (510), a method for decoding and a method for encoding the positions of intervals comprising events in an audio signal frame, and the respective computer programs and encoded signals. The apparatus for decoding (10, 40, 60, 410) comprises: an analysis unit (20, 42, 70, 420) to analyze a number of frame intervals, indicating the total number of intervals of the audio signal frame, a number of event intervals, indicating the number of intervals that comprise events of the audio signal frame, and an event state number; and a generating unit (30, 45, 80, 430) to generate an indication of a plurality of interval positions comprising events in the audio signal frame, using the number of frame intervals, the number of event intervals and the event state number.
Publication number: BR112013018362B1
Application number: R112013018362-4
Filing date: 2012-01-17
Publication date: 2021-01-19
Inventors: Kuntz Achim; Baeckstroem Tom; Disch Sascha
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
IPC main class:

Description:

Application field
The present invention relates to the field of audio processing and audio coding, in particular, to the coding and decoding of event interval positions in an audio signal frame.
Audio processing and/or coding has advanced in many ways. In particular, spatial audio applications have become increasingly important. Audio signal processing is often used to decorrelate or render signals. In addition, signal decorrelation and rendering are used in mono-to-stereo upmixing, mono/stereo-to-multichannel upmixing, artificial reverberation, stereo widening and user-interactive mixing/rendering.
Various audio signal processing systems employ decorrelators. An important example is the use of decorrelators in parametric spatial audio decoders to restore specific decorrelation properties between two or more signals that are reconstructed from one or more downmix signals. The use of decorrelators significantly improves the perceived quality of the output signal, for example when compared to intensity stereo. Specifically, decorrelators allow the proper synthesis of spatial sound with a wide sound image, several simultaneous sound objects and/or ambience. However, decorrelators are also known to introduce artifacts such as changes in the temporal signal structure, timbre, etc.
Other examples of the use of decorrelators in audio processing are the generation of artificial reverberation to alter the spatial impression, and the use of decorrelators in multichannel acoustic echo cancellation systems to improve the convergence behavior.
An important spatial audio coding scheme is Parametric Stereo (PS). Figure 1 illustrates the structure of a mono-to-stereo decoder. A single decorrelator generates a decorrelated signal D (a "wet" signal) from a mono input signal M (a "dry" signal). The decorrelated signal D is then fed to a mixer together with the signal M. The mixer then applies a mixing matrix H to the input signals M and D to generate the output signals L and R. The coefficients of the mixing matrix H can be fixed, signal dependent or controlled by a user.
Alternatively, the mixing matrix is controlled by side information that is transmitted together with the downmix and contains a parametric description of how to upmix the downmix signals to form the desired multichannel output. The spatial side information is normally generated during the mono downmix process in a corresponding signal encoder.
Spatial audio coding, as described above, is widely applied, for example, in Parametric Stereo. A typical structure of a parametric stereo decoder is shown in Figure 2. In Figure 2, the decorrelation is performed in a transform domain. Spatial parameters can be modified by a user or by additional tools, for example post-processing for binaural rendering/presentation. In this case, the upmix parameters are combined with the parameters of the binaural filters to calculate the input parameters for the upmix matrix.
The L / R output of the mixing matrix H is calculated from the mono input signal M and the de-correlated signal D.
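Written out, and assuming that H is a 2x2 matrix whose entries are named h_11 ... h_22 purely for illustration (these symbols do not appear in the text above), the relation between the mixer inputs and outputs would read:

```latex
\begin{bmatrix} L \\ R \end{bmatrix}
=
H \begin{bmatrix} M \\ D \end{bmatrix}
=
\begin{bmatrix} h_{11} & h_{12} \\ h_{21} & h_{22} \end{bmatrix}
\begin{bmatrix} M \\ D \end{bmatrix}
```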

In the mixing matrix, the amount of decorrelated sound fed to the output is controlled based on transmitted parameters, for example Inter-Channel Level Differences (ILD) and Inter-Channel Correlation/Coherence (ICC), and/or on fixed or user-defined settings.
Conceptually, the output signal D of the decorrelator replaces a residual signal that, under ideal conditions, would allow perfect decoding of the original L (left) / R (right) signals. Using the decorrelator output D instead of a residual signal in the upmixer saves the bitrate that would otherwise be required to transmit the residual signal. The purpose of the decorrelator is, therefore, to generate from the mono signal M a signal D that has properties similar to the residual signal it replaces. Reference is made to the document:
[1] J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "High-Quality Parametric Spatial Audio Coding at Low Bitrates", in Proceedings of the AES 116th Convention, Berlin, Preprint 6072, May 2004.
In MPEG Surround (MPS), PS-like structures called One-To-Two boxes (OTT boxes) are used in the spatial audio decoding trees. This can be seen as a generalization of the concept of mono-to-stereo upmixing to multichannel spatial audio encoding/decoding schemes. In MPS, there are also two-to-three upmix systems (TTT boxes) that can apply decorrelators, depending on the TTT mode of operation. Details are described in the document:
[2] J. Herre, K. Kjörling, J. Breebaart, et al., "MPEG Surround - the ISO/MPEG standard for efficient and compatible multi-channel audio coding", in Proceedings of the 122nd AES Convention, Vienna, Austria, May 2007.
Directional Audio Coding (DirAC) is a parametric sound field coding scheme that is not tied to a fixed number of audio output channels with fixed loudspeaker positions. DirAC applies decorrelators in the DirAC renderer, that is, in the spatial audio decoder. Directional Audio Coding is described in more detail in:
[3] Pulkki, Ville: "Spatial Sound Reproduction with Directional Audio Coding", in J. Audio Eng. Soc., Vol. 55, No. 6, 2007.
As for state-of-the-art decorrelators, reference is made to the documents:
[4] ISO/IEC International Standard "Information Technology - MPEG audio technologies - Part 1: MPEG Surround", ISO/IEC 23003-1:2007.
[5] J. Engdegård, H. Purnhagen, J. Rödén, L. Liljeryd, "Synthetic Ambience in Parametric Stereo Coding", in Proceedings of the AES 116th Convention, Preprint, May 2004.
Lattice IIR all-pass structures are used as decorrelators in spatial audio decoders such as MPS [2, 4]. Other state-of-the-art decorrelators apply (potentially frequency-dependent) delays to decorrelate the signals, or convolve the input signals, for example, with exponentially decaying noise bursts. For an overview of state-of-the-art decorrelators for spatial audio upmix systems, reference is made to document [5], "Synthetic Ambience in Parametric Stereo Coding".
In general, applause-like stereo or multichannel signals that are encoded/decoded in parametric spatial audio coders are known to suffer from reduced signal quality. Applause-like signals contain rather dense mixtures of transients from different directions. Examples of such signals are applause, the sound of rain, galloping horses, etc. Applause-like signals often also contain sound components from distant sound sources that are perceptually fused into a noise-like, smooth background sound field.
Lattice all-pass structures used in spatial audio decoders such as MPEG Surround act as artificial reverberation generators and are, therefore, well suited for generating homogeneous, smooth, noise-like, immersive sounds (such as room reverberation tails). However, there are examples of sound fields with a non-homogeneous spatio-temporal structure that still immerse the listener. A prominent example is applause-like sound fields, which envelop the listener not only with homogeneous noise-like fields, but also with rather dense sequences of individual claps from different directions.
Thus, the non-homogeneous component of applause sound fields can be characterized as a spatially distributed mixture of transients. These distinct claps are by no means homogeneous, smooth and noise-like. Because of their reverberation-like behavior, lattice all-pass decorrelators are unable to generate immersive sound fields with the characteristics of, for example, applause. Instead, when applied to applause-like signals, they tend to smear the transients in time. The undesirable result is an immersive sound field without the distinct spatio-temporal structure of applause-like sound fields. In addition, transient events such as a single clap may evoke ringing artifacts from the decorrelation filters.
USAC (Unified Speech and Audio Coding) is an audio coding standard for coding speech, audio and mixtures thereof at different bitrates.
The perceptual quality of USAC can be further improved in the stereo coding of applause and applause-like sounds at bitrates in the range of 32 kbps, where parametric stereo coding techniques are applicable. USAC-coded applause items tend to have a narrow sound stage and a lack of envelopment if no dedicated applause handling is applied within the codec. To a large extent, USAC's stereo coding techniques and their limitations were inherited from MPEG Surround (MPS). However, USAC offers a dedicated adaptation to the requirement of proper applause handling. This adaptation is called the Transient Steering Decorrelator (TSD) and is an embodiment of this invention.
Applause signals can be considered to be composed of individual, distinct nearby claps, temporally separated by a few milliseconds, and of a noise-like ambience from very dense, distant claps. In parametric stereo coding at a reasonable side information rate, the granularity of the spatial parameter sets (inter-channel level difference, inter-channel correlation, etc.) is far too low to ensure sufficient spatial redistribution of the individual claps, leading to a lack of envelopment. In addition, the claps are subjected to processing by a lattice all-pass decorrelator. This inevitably induces a temporal dispersion of the transients and further reduces the subjective quality.
The use of a Transient Steering Decorrelator (TSD) within the USAC decoder results in a modification of the MPS processing. The idea behind this approach is to solve the applause decorrelation problem as follows:
- Separate the transients in the QMF domain before the lattice all-pass decorrelator, that is: split the decorrelator input signal into a transient stream s2 and a non-transient stream s1.
- Feed the transient stream to a different, parameter-controlled decorrelator that is well suited for mixtures of transients.
- Feed the non-transient stream to the MPS lattice all-pass decorrelator.
- Add the outputs of both decorrelators, D1 and D2, to obtain the decorrelated signal D.
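The four steps above can be sketched as follows. This is only an illustration of the signal routing: both decorrelators are hypothetical stand-ins (a one-interval delay and a fixed phase rotation), not the filters used in MPS or USAC, and all function names are assumptions made for this example.

```python
import cmath


def lattice_decorrelator_stub(stream):
    """Hypothetical stand-in for the lattice all-pass decorrelator: a one-interval delay."""
    return [0j] + list(stream[:-1])


def transient_decorrelator_stub(stream, angle=cmath.pi / 2):
    """Hypothetical stand-in for the transient decorrelator: a fixed phase rotation."""
    rotation = cmath.exp(1j * angle)
    return [x * rotation for x in stream]


def tsd_decorrelate(signal, transient_flags):
    """Route each interval to one of two decorrelators and add the results.

    signal          -- one complex sample per time interval (a single band, for brevity)
    transient_flags -- one truthy/falsy flag per interval, truthy means "transient"
    """
    # Step 1: split into a non-transient stream s1 and a transient stream s2.
    s1 = [0j if t else x for x, t in zip(signal, transient_flags)]
    s2 = [x if t else 0j for x, t in zip(signal, transient_flags)]
    # Steps 2 and 3: feed each stream to its own decorrelator.
    d1 = lattice_decorrelator_stub(s1)
    d2 = transient_decorrelator_stub(s2)
    # Step 4: add both outputs to obtain the decorrelated signal D.
    return [a + b for a, b in zip(d1, d2)]


if __name__ == "__main__":
    print(tsd_decorrelate([1 + 0j, 2 + 0j, 3 + 0j, 4 + 0j], [0, 1, 0, 0]))
```

Because the two streams are disjoint in time, the sum in the last step never mixes a transient interval with the all-pass processing of that same interval.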
Figure 3 illustrates a One-To-Two (OTT) configuration within the USAC decoder. The U-shaped transient handling box of Figure 3 comprises a parallel signal path, as proposed for transient handling.
Two parameters that govern the TSD process are transmitted as frequency-independent parameters from the encoder to the decoder (see Figure 3):
- A binary transient/non-transient decision of a transient detector operating in the encoder, which is used to control the transient separation with time-interval granularity in the decoder. An efficient lossless coding scheme is used to transmit the QMF interval position data of the transients.
- The actual transient decorrelator parameters, which the transient decorrelator needs to steer the spatial distribution of the transients. These parameters denote an angle between the downmix and the residual. They are only transmitted for time intervals that have been detected in the encoder to contain transients.
In order to assess the quality of the technology described above, two MUSHRA listening tests were performed in a controlled listening test environment using high-quality STAX electrostatic headphones. The tests were performed in the 32 kbps and 16 kbps stereo configurations. Sixteen expert listeners participated in each of the tests.
Since the USAC test set does not contain applause items, additional applause items were chosen to demonstrate the benefit of the proposed technology. The items listed in Table 1 were included in the test: Table 1: Listening test items:

For the twelve regular MPEG items of the USAC test set, TSD is never active. However, these items do not remain exactly bit-identical, since the TSD enable bit (indicating that TSD is off) is additionally included in the bitstream and therefore slightly affects the bit budget of the core coder. Since these differences are very small, these items were not included in the listening test. Data on the size of these differences are provided to show that the changes are negligible and imperceptible.
A codec tool called inter-TES is part of the USAC reference model 8 (RM8). Since this technique has been reported to improve the perceptual quality of transients, including applause-like signals, inter-TES was always switched on in all test conditions. In such a setting, the best possible quality is ensured and the orthogonality of inter-TES and TSD is demonstrated.
The systems under test have the following configurations:
- RM8: RM8 USAC system;
- CE: RM8 USAC system enhanced by the Transient Steering Decorrelator (TSD).
Figures 4 and 5 show the MUSHRA scores together with their 95% confidence intervals for the 32 kbps test scenario. For the test data, a Student's t-distribution was assumed. The absolute scores in Figure 4 show a higher mean score for all items; for four of the five items there is a significant improvement in the 95% confidence sense. No item was degraded relative to RM8. The difference scores for USAC + TSD, as assessed in the TSD core experiment (CE), relative to RM8 USAC are shown in Figure 5. Here, a significant improvement can be seen for all items.
For the 16 kbps test scenario, Figures 6 and 7 show the MUSHRA scores along with their 95% confidence intervals. A Student's t-distribution of the data was assumed. The absolute scores in Figure 6 show a higher mean score for every item. For one item, the improvement is significant in the 95% confidence sense. No item scored worse than RM8. The difference scores are plotted in Figure 7. Again, a significant improvement was demonstrated for all items in terms of difference scores.
The TSD tool is activated by a bsTsdEnable flag transmitted in the bitstream. If TSD is enabled, the actual separation of transients is controlled by the transient detection flags TsdSepData, which are also transmitted in the bitstream, encoded in bsTsdCodedPos.
In the encoder, the TSD enable flag bsTsdEnable is generated by a segmental classifier. The transient detection flags TsdSepData are set by a transient detector.
As already indicated, TSD is not activated for the 12 MPEG USAC test items. For the 5 additional applause items, TSD activation is illustrated in Figure 8, which shows the logical state of bsTsdEnable versus time.
If TSD is enabled, transients are detected in certain QMF time intervals and these are subsequently fed to the dedicated transient decorrelator. For each additional test item, Table 2 lists the percentage of intervals within TSD-enabled frames that comprise transients. Table 2: Transient interval density (percentage of transient intervals among all time intervals in TSD-enabled frames)

Transmitting the transient separation decisions and the decorrelator parameters from the encoder to the decoder requires a certain amount of side information. However, this amount is overcompensated by the bitrate savings from transmitting broadband spatial cues within MPS.
As a result, the average MPS + TSD side information bitrates are even lower than the MPS side information bitrates in plain USAC, which are listed in the first column of Table 3. In the proposed configuration, as used for the subjective quality assessment, the average bitrates listed in the second column of Table 3 were measured for TSD: Table 3: MPS (+ TSD) bitrates in bits/second within the 32 kbps stereo codec scenario:

The computational complexity of TSD results from:
- the decoding of the transient interval positions;
- the complexity of the transient decorrelator.
Assuming an MPEG Surround spatial frame length of 32 time intervals, decoding the interval positions requires, in the worst case, (64 divisions + 80 multiplications) per spatial frame, that is, counting 25 operations per division, 64 * 25 + 80 = 1680 operations per spatial frame.
Ignoring copy operations and conditional statements, the complexity of the transient decorrelator is given by one complex multiplication per interval and hybrid QMF band.
This leads to the following overall TSD complexity numbers, shown in comparison to the plain USAC complexity numbers in Table 4: Table 4: Complexity of the TSD decoder in MOPS and relative to the complexity of the plain USAC decoder:

In summary, the listening test data clearly show a significant improvement in the subjective quality of the applause signals in the difference scores of all items at both operating points. In terms of absolute scores, all items in the TSD condition have a higher mean score. For 32 kbps, a significant improvement exists for four of the five items. For 16 kbps, one item shows a significant improvement. None of the items scored worse than RM8. As can be seen from the complexity data, this improvement is achieved at negligible computational cost. This further emphasizes the benefit of the TSD tool for USAC.
The Transient Steering Decorrelator described above significantly improves audio processing in USAC. However, as seen above, the Transient Steering Decorrelator requires information on the presence or absence of transients in a given interval. In USAC, information about time intervals can be transmitted on a frame-by-frame basis. A frame comprises several, for example 32, time intervals. It is therefore appropriate that an encoder also transmits the information on which intervals comprise transients on a frame-by-frame basis. Reducing the number of bits to be transmitted is crucial in audio signal processing. Since even a single audio recording comprises a vast number of frames, reducing the number of bits to be transmitted per frame by just a few bits can reduce the overall bitrate significantly.
The problem of decoding event interval positions in an audio signal frame, however, is not limited to the problem of decoding transients. It would also be useful to decode the interval positions of other events, for example, whether an interval of an audio signal frame is tonal (or not), whether it comprises noise (or not), and the like. In fact, an apparatus for efficient encoding and decoding of event interval positions in an audio signal frame would be very useful for a large number of different types of events.
When this document refers to intervals or interval positions of an audio signal frame, intervals in this sense can be time intervals, frequency intervals, time-frequency intervals or any other type of interval. Furthermore, it is understood that the present invention is not limited to audio processing and audio signal frames in USAC, but instead relates to any type of audio signal frame and any type of audio format, such as MPEG-1/2 Layer 3 ("MP3"), Advanced Audio Coding (AAC), and the like. The efficient encoding and decoding of event interval positions in an audio signal frame would be very useful for any type of audio signal frame.
It is, therefore, an object of the present invention to provide an apparatus for encoding event interval positions in an audio signal frame with a reduced number of bits. Furthermore, it is an object of the present invention to provide an apparatus for decoding the event interval positions in an audio signal frame, encoded by an apparatus for encoding in accordance with the present invention. The objects of the present invention are achieved by an apparatus for decoding according to claim 1, an apparatus for encoding according to claim 11, a decoding method according to claim 14, an encoding method according to claim 15, a computer program for decoding according to claim 16, a computer program for encoding according to claim 17 and a signal encoded according to claim 18.
The present invention assumes that a number of frame intervals, indicating the total number of intervals of an audio signal frame, and a number of event intervals, indicating the number of intervals that comprise events of the audio signal frame, are available in a decoding apparatus of the present invention. For example, an encoder can transmit the number of frame intervals and/or the number of event intervals to the decoder. According to an embodiment, the encoder can indicate the total number of intervals of an audio signal frame by transmitting a number which is the total number of intervals of the audio signal frame minus 1. The encoder can also indicate the number of intervals that comprise events of the audio signal frame by transmitting a number which is the number of intervals that comprise events of the audio signal frame minus 1. Alternatively, the decoder can itself determine the total number of intervals of an audio signal frame and the number of intervals that comprise events of the audio signal frame without information from an encoder.
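A minimal sketch of the "minus 1" convention mentioned above, assuming a fixed-width unsigned field; the field width n_bits and both function names are hypothetical and only serve the illustration. Transmitting the count minus 1 lets an n-bit field cover the counts 1 to 2^n instead of 0 to 2^n - 1, which is useful when a count of zero never has to be signalled.

```python
def pack_count(count, n_bits):
    """Return the field value (count - 1) for a count in 1 .. 2**n_bits."""
    if not 1 <= count <= (1 << n_bits):
        raise ValueError("count does not fit into the field")
    return count - 1


def unpack_count(field_value):
    """Recover the count from the transmitted field value."""
    return field_value + 1


if __name__ == "__main__":
    # With a hypothetical 5-bit field, a frame of 32 intervals is sent as 31.
    print(pack_count(32, 5), unpack_count(31))
```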
Based on these assumptions, according to the present invention, the positions of the intervals comprising events of an audio signal frame can be encoded and decoded using the following findings: Let N be the total number of intervals of an audio signal frame, and P be the number of intervals that comprise events of the audio signal frame.
It is assumed that both the encoding apparatus and the decoding apparatus are aware of the values of N and P. Knowing N and P, it can be derived that there are only $\binom{N}{P}$ different combinations of interval positions that comprise events in an audio signal frame.
For example, if the interval positions in a frame are numbered from 0 to N-1 and P = 8, then a first possible combination of positions with events would be (0, 1, 2, 3, 4, 5, 6, 7), a second would be (0, 1, 2, 3, 4, 5, 6, 8), and so on, up to the combination (N-8, N-7, N-6, N-5, N-4, N-3, N-2, N-1), so that in total there are $\binom{N}{P}$ different combinations.
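The enumeration in this example can be reproduced with a few lines of Python. The frame length N = 10 is chosen here only to keep the list short; it is an assumption of this sketch, not a value from the text. The total count equals the binomial coefficient of N and P.

```python
from itertools import combinations
from math import comb


def event_position_combinations(n, p):
    """List every way to place p events on the interval positions 0 .. n-1."""
    return list(combinations(range(n), p))


if __name__ == "__main__":
    combos = event_position_combinations(10, 8)
    print(combos[0])    # first combination
    print(combos[1])    # second combination
    print(combos[-1])   # last combination: positions N-8 .. N-1
    print(len(combos), comb(10, 8))
```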
In addition, the present invention uses the further finding that an event state number can be encoded by an apparatus for encoding and transmitted to the decoder. If each of the possible combinations is represented by a unique event state number, and if the apparatus for decoding is aware which event state number represents which combination of interval positions comprising events of an audio signal frame (for example, by applying an appropriate decoding method), then the apparatus for decoding can decode the positions of intervals comprising events using N, P and the event state number. For many typical values of N and P, such an encoding technique uses fewer bits for encoding event interval positions than other methods (for example, employing a bit array with one bit for each interval of the frame, in which each bit indicates whether an event occurred in this interval or not).
In other words, the problem of encoding the event interval positions in an audio signal frame can be solved by encoding a discrete number P of positions $p_k$ in a range [0 ... N-1], such that the positions do not overlap, that is, $p_k \neq p_h$ for $k \neq h$, with as few bits as possible. Since the order of the positions does not matter, the number of unique combinations of positions is the binomial coefficient $\binom{N}{P}$. The number of bits required is therefore:
$$\text{bits} = \left\lceil \log_2 \binom{N}{P} \right\rceil$$
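To make the saving concrete, the following sketch compares the number of bits needed for an event state number with a plain bit array that spends one bit per interval. The frame length of 32 intervals is taken from the example given earlier; the function name is an assumption. The rounded-up base-2 logarithm is computed with integer arithmetic to avoid floating-point rounding.

```python
from math import comb


def bits_for_state_number(n, p):
    """Bits needed to distinguish all comb(n, p) combinations, i.e. ceil(log2(comb(n, p)))."""
    count = comb(n, p)
    # For count >= 1, (count - 1).bit_length() equals ceil(log2(count)).
    return (count - 1).bit_length()


if __name__ == "__main__":
    n = 32
    for p in (1, 2, 4, 8, 16):
        print(f"N={n}, P={p}: {bits_for_state_number(n, p)} bits "
              f"versus {n} bits for a bit array")
```

For N = 32 and P = 8, for instance, there are 10518300 combinations, which fit into 24 bits instead of 32.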

In one embodiment, a decoding apparatus is provided which is adapted to conduct a test comparing an event state number or an updated event state number with a threshold value. Such a test can be used to derive the positions of intervals that comprise events from an event state number. The test comparing an event state number with a threshold value can be conducted by checking whether the event state number or an updated event state number is greater than, greater than or equal to, less than, or less than or equal to the threshold value. In addition, it is preferred that the decoding apparatus is adapted to update the event state number or an updated event state number, depending on the test result.
According to an embodiment, a decoding apparatus is provided which is adapted to conduct the test comparing an event state number or an updated event state number with the threshold value with respect to a particular considered interval, wherein the threshold value depends on the number of frame intervals, the number of event intervals and the position of the considered interval within the frame. In this way, the positions of the intervals that comprise events can be determined on an interval-by-interval basis, deciding for each interval of the frame, one after the other, whether the interval comprises an event.
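One way such an interval-by-interval threshold test could look is sketched below. It is an assumption of this example, not a statement of the claimed method, that the threshold for the interval at position k is the binomial coefficient of k and the number of events still to be placed; this choice corresponds to the well-known combinatorial number system. The encoder helper is included only so that the round trip can be checked, and all function names are hypothetical.

```python
from itertools import combinations
from math import comb


def encode_event_state(flags):
    """Map per-interval event flags to a single event state number (helper for the round trip)."""
    state, rank = 0, 0
    for k, flag in enumerate(flags):
        if flag:
            rank += 1                 # this is the rank-th event, counted from position 0
            state += comb(k, rank)
    return state


def decode_event_positions(n, p, state):
    """Recover the event flags of n intervals from p and the event state number.

    Intervals are visited from the last to the first. For each one, the
    (updated) state is compared with a threshold that depends on the
    interval position k and on the number p of events still to be placed.
    """
    flags = [0] * n
    for k in range(n - 1, -1, -1):
        if p == 0:
            break                     # all events have been placed
        threshold = comb(k, p)        # assumed threshold; equals 0 when p > k
        if state >= threshold:        # comparison test
            flags[k] = 1              # interval k comprises an event
            state -= threshold        # update the event state number
            p -= 1
    return flags


if __name__ == "__main__":
    original = [0, 1, 0, 0, 1, 1]
    s = encode_event_state(original)
    print(s, decode_event_positions(len(original), sum(original), s))
```

With this choice of threshold, the state numbers of all combinations for given N and P are exactly the integers from 0 to $\binom{N}{P} - 1$, so no code value is wasted.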
According to another embodiment, a decoding apparatus is provided which is adapted to divide the frame into a first frame partition comprising a first set of frame intervals and a second frame partition comprising a second set of frame intervals, wherein the decoding apparatus is further adapted to determine the positions of intervals comprising events separately for each of the frame partitions. Thus, the positions of the intervals that comprise events can be determined by repeatedly dividing a frame or frame partitions into even smaller frame partitions.
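A partition-based decoder along these lines might be sketched as follows. The way the event state number is split between the two partitions is an assumption made for this illustration: the range of state numbers is divided into blocks, one block per possible number of events in the first partition, and within a block the state is split into two sub-states by integer division. The halving rule and the function name are likewise hypothetical.

```python
from math import comb


def decode_partitioned(n, p, state):
    """Return n event flags containing exactly p events, for 0 <= state < comb(n, p)."""
    if p == 0:
        return [0] * n                # no events left in this partition
    if p == n:
        return [1] * n                # every interval comprises an event
    n_a = n // 2                      # first partition (assumed: first half)
    n_b = n - n_a                     # second partition (the rest)
    # Try every feasible number of events p_a in the first partition.
    for p_a in range(max(0, p - n_b), min(n_a, p) + 1):
        p_b = p - p_a
        block = comb(n_a, p_a) * comb(n_b, p_b)
        if state < block:             # threshold test selects the block
            # Split the state into one sub-state per partition.
            state_a, state_b = divmod(state, comb(n_b, p_b))
            return (decode_partitioned(n_a, p_a, state_a)
                    + decode_partitioned(n_b, p_b, state_b))
        state -= block                # update the state and try the next block
    raise ValueError("state number out of range")


if __name__ == "__main__":
    for s in range(comb(4, 2)):
        print(s, decode_partitioned(4, 2, s))
```

The block sizes add up to $\binom{N}{P}$ (Vandermonde's identity), so every valid state number maps to exactly one pattern of events.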
In the following, the embodiments of the present invention will be described in more detail with reference to the figures, in which:
Figure 1 is a typical application of a decorrelator in a mono-to-stereo upmixer;
Figure 2 is another typical application of a decorrelator in a mono-to-stereo upmixer;
Figure 3 is an overview of the One-To-Two (OTT) system, including a Transient Steering Decorrelator (TSD);
Figure 4 is a diagram illustrating absolute scores for 32 kbps stereo comparing RM8 USAC and RM8 USAC + TSD in a TSD core experiment (CE);
Figure 5 is a diagram showing difference scores for 32 kbps stereo comparing USAC employing a Transient Steering Decorrelator against a plain USAC system;
Figure 6 is a diagram showing absolute scores for 16 kbps stereo comparing RM8 USAC and RM8 USAC + TSD in a TSD core experiment (CE);
Figure 7 is a diagram showing difference scores for 16 kbps stereo comparing USAC employing a Transient Steering Decorrelator against a plain USAC system;
Figure 8 shows TSD activity for five additional items, depicted as the logical state of the bsTsdEnable flag;
Figure 9a illustrates a decoding apparatus for the positions of intervals comprising events in an audio signal frame according to an embodiment of the present invention;
Figure 9b illustrates a decoding apparatus for the positions of intervals comprising events in an audio signal frame according to another embodiment of the present invention;
Figure 9c illustrates a decoding apparatus for the positions of intervals comprising events in an audio signal frame according to another embodiment of the present invention;
Figure 10 is a flow chart illustrating a decoding process performed by a decoding apparatus according to an embodiment of the present invention;
Figure 11 illustrates pseudocode implementing the decoding of the positions of intervals comprising events according to an embodiment of the present invention;
Figure 12 is a flow chart illustrating an encoding process performed by an encoding apparatus according to an embodiment of the present invention;
Figure 13 is pseudocode describing a process of encoding the positions of intervals comprising events in an audio signal frame according to another embodiment of the invention;
Figure 14 illustrates a decoding apparatus for the positions of intervals comprising events in an audio signal frame according to another embodiment of the present invention;
Figure 15 illustrates an encoding apparatus for the positions of intervals comprising events in an audio signal frame according to an embodiment of the present invention;
Figure 16 represents the USAC MPS 212 data syntax according to an embodiment;
Figure 17 illustrates the USAC TsdData syntax according to an embodiment;
Figure 18 illustrates an nBitsTrSlots table depending on the length of the MPS frame;
Figure 19 shows a table related to the USAC bsTempShapeConfig according to an embodiment;
Figure 20 represents the USAC TempShapeData syntax according to an embodiment;
Figure 21 illustrates a decorrelator block D in an OTT decoding block according to an embodiment;
Figure 22 illustrates the USAC EcData syntax according to an embodiment;
Figure 23 illustrates a signal flowchart for generating TSD data.
Figure 9a illustrates an apparatus 10 for decoding the positions of intervals comprising events in an audio signal frame according to an embodiment of the present invention. The decoding apparatus 10 includes an analysis unit 20 and a generating unit 30. A number of frame intervals FSN, indicating the total number of intervals of an audio signal frame, a number of event intervals ESON, indicating the number of intervals that comprise events of the audio signal frame, and an event state number ESTN are fed into the decoding apparatus 10. The decoding apparatus 10 then decodes the positions of the intervals that comprise events using the number of frame intervals FSN, the number of event intervals ESON and the event state number ESTN. Decoding is performed by the analysis unit 20 and the generating unit 30, which cooperate in the decoding process. While the analysis unit 20 is responsible for executing tests, for example, comparing the event state number ESTN with a threshold value, the generating unit 30 generates and updates the intermediate results of the decoding process, for example, an updated event state number.
In addition, the generating unit 30 generates an indication of a plurality of interval positions comprising events in the audio signal frame. The particular indication of a plurality of interval positions comprising events from the audio signal frame can be referred to as an "indicative state".
According to an application, the indication of a plurality of interval positions comprising events in the audio signal frame can be generated in such a way that, at a first point in time, the generating unit 30 indicates for a first interval whether or not that interval comprises an event, at a second point in time, the generating unit 30 indicates for a second interval whether or not that interval comprises an event, and so on.
According to another application, the indication of a plurality of interval positions comprising events can be, for example, a bit array indicating, for each interval of the frame, whether it comprises an event.
The analysis unit 20 and the generating unit 30 can cooperate in such a way that both units call each other one or more times in the decoding process to produce intermediate results.
Figure 9b illustrates a decoding apparatus 40 according to an application of the present invention. The decoding apparatus 40 differs, inter alia, from the apparatus 10 of Figure 9a in that it further comprises an audio signal processor 50. The audio signal processor 50 receives an audio input signal and the indication of a plurality of interval positions comprising events in the audio signal frame, which was generated by a generating unit 45. Depending on the indication, the audio signal processor 50 generates an audio output signal, for example, by decorrelating the audio input signal. In addition, the audio signal processor 50 may comprise a cross-linked IIR decorrelator 54, a transient decorrelator 56 and a transient separator 52 for generating the audio output signal, as shown in Figure 3. If the indication of a plurality of interval positions comprising events in the audio signal frame indicates that an interval comprises a transient, the audio signal processor 50 decorrelates the audio input signal relating to that interval using the transient decorrelator 56. If, however, the indication indicates that an interval does not comprise a transient, the audio signal processor 50 decorrelates the audio input signal S relating to that interval using the cross-linked IIR decorrelator 54. The audio signal processor employs the transient separator 52, which decides, based on the indication, whether the portion of the audio input signal relating to an interval is fed into the transient decorrelator 56 (if the interval comprises a transient) or into the cross-linked IIR decorrelator 54 (if the interval does not comprise a transient).
Figure 9c illustrates an apparatus 60 for decoding according to an application of the present invention. The decoding apparatus 60 differs from the apparatus 10 of Figure 9a in that it further comprises an interval selector 90. Decoding is done on an interval-by-interval basis, deciding for each interval of the frame, one after the other, whether the interval comprises an event. The interval selector 90 decides which interval of a frame to consider. The preferred approach would be for the interval selector 90 to choose the intervals of a frame one after the other.
The interval-by-interval decoding of the decoding apparatus 60 of the present application is based on the following conclusions, which can be employed for the applications of a decoding apparatus, an encoding apparatus, a decoding method and a method for encoding the interval positions that comprise events in an audio signal frame. The following conclusions are also applicable for the respective computer programs and coded signals:
Assume that N is the (total) number of intervals of an audio signal frame and P is the number of intervals comprising events of the frame (this means that N can be the number of frame intervals FSN and P can be the number of event intervals ESON). The first interval of the frame is considered. Two cases can be distinguished:
If the first interval is an interval that does not comprise an event, then there are only $\binom{N-1}{P}$ different possible combinations of the P interval positions comprising an event with respect to the remaining N-1 intervals of the frame.
However, if the first interval is an interval that comprises an event, then there are only $\binom{N-1}{P-1} = \binom{N}{P} - \binom{N-1}{P}$ different possible combinations of the remaining P-1 interval positions comprising an event with respect to the remaining N-1 intervals of the frame.
Based on this finding, the applications are also based on the finding that all combinations with a first interval where no event has occurred should be coded by event state numbers that are less than or equal to a threshold value. In addition, all combinations with a first interval where an event has occurred should be coded by event state numbers that are greater than the threshold value. In an application, all event state numbers can be positive integers or 0, and an appropriate threshold value with respect to the first interval can be $\binom{N-1}{P}$.
In one application, an apparatus for decoding is adapted to determine whether the first interval of a frame comprises an event by testing whether the event state number is greater than a threshold value. (Alternatively, the encoding/decoding process of applications can also be realized such that a decoding device tests whether the event state number is greater than or equal to, less than or equal to, or less than a threshold value.) After analyzing the first interval, decoding proceeds to the second interval of the frame using adjusted values: in addition to adjusting the number of intervals considered (which is reduced by one), the number of intervals comprising events is also reduced by one if the first interval comprised an event, and the event state number is adjusted, if it was greater than the threshold value, to remove the portion relating to the first interval. The decoding process can be continued for further intervals of the frame in a similar way.
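As a small worked instance of the counting argument above, consider a frame of N = 4 intervals with P = 2 events (values chosen here purely for illustration). The two cases, first interval without and with an event, together account for every combination, which is Pascal's rule:

```latex
\binom{N}{P} = \binom{N-1}{P} + \binom{N-1}{P-1},
\qquad
\binom{4}{2} = \binom{3}{2} + \binom{3}{1} = 3 + 3 = 6 .
```

Three of the six combinations therefore have no event in the first interval and three have one, so a threshold derived from $\binom{3}{2} = 3$ separates the two groups.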
In an application, a set of P discrete positions p_k in the range [0 ... N-1] is coded such that the positions do not overlap, that is, p_k ≠ p_h for k ≠ h. Here, each unique combination of positions in the given range is called a state and each possible position in this range is called an interval. According to an application for a decoding device, the first interval in the range is considered. If the interval does not have a position assigned to it, then the range can be reduced to N-1, and the number of possible states is reduced to $\binom{N-1}{P}$. On the other hand, if the state is greater than $\binom{N-1}{P}$, one can conclude that the first interval has a position assigned to it. The following decoding algorithm can result from this:
For each interval h
    If state > $\binom{N-h-1}{P}$, then
        Assign a position to interval h
        Update the remaining state: state := state - $\binom{N-h-1}{P}$
        Reduce the number of positions left: P := P-1
    End
End
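Calculating the binomial coefficient in each iteration would be costly. Therefore, according to applications, the following rules can be used to update the binomial coefficient using the value from the previous iteration:
$\binom{N-1}{P} = \binom{N}{P} \cdot \frac{N-P}{N}$ and $\binom{N}{P-1} = \binom{N}{P} \cdot \frac{P}{N-P+1}$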
Using these formulas, each update of the binomial coefficient costs only one multiplication and one division, while an explicit evaluation would cost P multiplications and divisions in each iteration.
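The interval-by-interval decoding with incrementally updated coefficients can be sketched as follows. This Python is an illustration under stated assumptions, not the implementation of the application: it assumes zero-based event state numbers, so the test reads `state >= c` (the convention of step 170 of Figure 10; the text above allows either comparison); it takes the update rules to be C(N-1, P) = C(N, P)·(N-P)/N and C(N, P-1) = C(N, P)·P/(N-P+1); and it adds a guard, modelled on step 150 of Figure 10, for the case where every remaining interval must hold an event. The name `decode_positions` is hypothetical.

```python
from math import comb


def decode_positions(state: int, n: int, p: int) -> list[int]:
    """Decode p event positions in a frame of n intervals from a zero-based state.

    Assumes 0 <= state < comb(n, p). The binomial coefficient is computed once
    and then updated with one multiplication and one division per step.
    """
    positions: list[int] = []
    c = comb(n - 1, p)  # C(m, p) for m = n - 1 intervals after the first one
    for h in range(n):
        m = n - h - 1  # intervals remaining after interval h
        if p == 0:
            break  # no events left to place
        if p > m:
            # Guard: interval h and all later intervals must hold events.
            positions.extend(range(h, n))
            break
        # Invariant here: c == C(m, p) with m >= p >= 1.
        if state >= c:
            positions.append(h)
            state -= c
            c = c * p // (m - p + 1)  # C(m, p-1) from C(m, p)
            p -= 1
        if m > 0:
            c = c * (m - p) // m  # C(m-1, p) from C(m, p)
    return positions
```

Both integer divisions are exact because each intermediate value is itself a binomial coefficient. For a frame of four intervals with two events, `decode_positions(0, 4, 2)` yields `[2, 3]` and `decode_positions(5, 4, 2)` yields `[0, 1]`.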
In this application, the total complexity of the decoder is P multiplications and divisions for initialization of the binomial coefficient, for each iteration 1 multiplication, division and if statement, and for each encoded position 1 multiplication, addition and division. Note that, in theory, it would be possible to reduce the number of divisions required for initialization to one. In practice, however, this approach would result in large integers, which are difficult to handle. The worst case of decoder complexity is, then, N + 2P divisions and N + 2P multiplications, P additions (can be ignored if MAC operations are used) and N if statements.
In an application, the coding algorithm used by a device for coding does not have to iterate through all the intervals, but only those that have a position assigned to it. Therefore,

The worst case of encoder complexity is P·(P-1) multiplications and P·(P-1) divisions, as well as P-1 additions.
Figure 10 illustrates a decoding process carried out by an apparatus for decoding according to an application of the present invention. In this application, decoding is performed on an interval-by-interval basis.
In step 110, the values are initialized. The device for decoding stores the event state number, which is received as an input value, in the variable s. In addition, the number of intervals comprising events, as indicated by the number of event intervals, is stored in the variable p. Furthermore, the total number of intervals contained in the frame, as indicated by the number of frame intervals, is stored in the variable N.
In step 120, the TsdSepData [t] value is initialized to 0 for all intervals in the frame. The TsdSepData bit set is the data output to be generated. This indicates for each interval position t, whether the interval with the corresponding interval position comprises an event (TsdSepData [t] = 1) or if not (TsdSepData [t] = 0). In step 120, the corresponding values for all frame intervals are initialized to 0.
In step 130, the variable k is initialized with the value N-1. In this application, the intervals of a frame comprising N elements are numbered 0, 1, 2, ..., N-1. Setting k = N-1 means that the interval with the highest interval number is considered first.
In step 140, it is examined whether k ≥ 0. If k < 0, the decoding of the interval positions has been completed and the process ends; otherwise the process continues with step 150.
In step 150, it is tested whether p > k. If p is greater than k, this means that all the remaining intervals comprise an event. The process continues at step 230, where all TsdSepData field values of the remaining intervals 0, 1, ..., k are set to 1, indicating that each of the remaining intervals comprises an event. In this case, the process then terminates. However, if step 150 finds that p is not greater than k, the decoding process continues at step 160.
In step 160, the value $c = \binom{k}{p}$ is calculated; c is used as the threshold value.
In step 170, it is tested whether the number of event status s (possibly updated) is greater than or equal to c, where c is the threshold value calculated in step 160.
If s is less than c, this means that the interval considered (with the interval position k) does not comprise an event. In this case, no further action is needed, as TsdSepData[k] was already set to 0 for this interval in step 120. The process then continues with step 220. In step 220, k is set to k := k-1 and the next interval is considered.
However, if the test in step 170 shows that s is greater than or equal to c, this means that the interval k considered comprises an event. In this case, the event state number s is updated and set to the value s := s-c in step 180. In addition, TsdSepData[k] is set to 1 in step 190 to indicate that the interval k comprises an event. Furthermore, in step 200, p is set to p := p-1, indicating that the remaining intervals to be examined now comprise only p-1 intervals with events.
In step 210, it is tested whether p is equal to 0. If p is equal to 0, the remaining intervals do not comprise the events and the decoding process is ended. Otherwise, at least one of the remaining intervals comprises an event and the process continues at step 220, where the decoding process continues with the next interval (k-1).
The application decoding process illustrated in Figure 10 generates the TsdSepData set as the output value that indicates for each k interval of the frame, if the interval comprises an event (TsdSepData [k] = 1), or if not (TsdSepData [k ] = 0).
Returning to Figure 9c, a decoding apparatus 60 according to an application that implements the decoding process illustrated in Figure 10 comprises an interval selector 90, which decides which intervals to consider. With reference to Figure 10, this interval selector would be adapted to carry out process steps 130 and 220 of Figure 10. A suitable analysis unit 70 of this application would be adapted to carry out process steps 140, 150, 170 and 210 of Figure 10. The generating unit 80 of such an application would be adapted to carry out all other process steps of Figure 10.
Figure 11 illustrates a pseudocode implementing the decoding of the positions of the intervals, comprising events according to an application of the present invention.
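The pseudocode of Figure 11 is not reproduced in this text. As a stand-in, the flow of Figure 10 can be sketched in Python as below. This is a minimal sketch under assumptions: the threshold of step 160 is taken to be the binomial coefficient C(k, p), the event state number is taken to be zero-based, and the function name `tsd_decode` and its parameter names are hypothetical.

```python
from math import comb


def tsd_decode(s: int, num_slots: int, p: int) -> list[int]:
    """Follow steps 110 to 230 of Figure 10 and return the TsdSepData array.

    s is the event state number, num_slots the number of frame intervals (N)
    and p the number of intervals comprising events.
    """
    tsd_sep_data = [0] * num_slots      # step 120: all flags start at 0
    k = num_slots - 1                   # step 130: highest interval first
    while k >= 0:                       # step 140
        if p > k:                       # step 150: all remaining hold events
            for j in range(k + 1):      # step 230
                tsd_sep_data[j] = 1
            break
        c = comb(k, p)                  # step 160 (assumed threshold C(k, p))
        if s >= c:                      # step 170
            s -= c                      # step 180
            tsd_sep_data[k] = 1         # step 190
            p -= 1                      # step 200
            if p == 0:                  # step 210: nothing left to place
                break
        k -= 1                          # step 220
    return tsd_sep_data
```

For a frame of four intervals with two events, `tsd_decode(5, 4, 2)` returns `[0, 0, 1, 1]` and `tsd_decode(0, 4, 2)` returns `[1, 1, 0, 0]`.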
Figure 12 illustrates a coding process carried out by an apparatus for coding in accordance with an application of the present invention. In this application, encoding is performed on an interval-by-interval basis. The purpose of the coding process according to the application illustrated in Figure 12 is to generate an event status number.
In step 310, the values are initialized. p_s is initialized to 0. The event state number is generated by successively updating the variable p_s. When the encoding process is completed, p_s will hold the event state number. Step 310 also initializes the variable k by setting k := (number of intervals comprising events in the frame) - 1.
In step 320, the variable "intervals" is set to intervals := tsdPos[k], where tsdPos is an array holding the positions of the intervals comprising events. The interval positions in the array are stored in ascending order.
In step 330, a test is performed, testing whether k ≥ intervals. If this is the case, the process ends. Otherwise, the process continues at step 340.
In step 340, the value $c = \binom{\text{intervals}}{k+1}$ is calculated. In step 350, the variable p_s is updated and set to p_s := p_s + c.
In step 360, k is set to k := k-1.
Then, in step 370, a test is conducted, testing whether k ≥ 0. If so, the next position tsdPos[k] is considered. Otherwise, the process ends.
Figure 13 illustrates a pseudocode, implementing the encoding of the positions of the intervals that comprise the events according to an application of the present invention.
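The pseudocode of Figure 13 is not reproduced in this text either. The encoding flow of Figure 12 can be sketched in Python as follows, under the assumptions that the value of step 340 is the binomial coefficient C(intervals, k+1) and that the tests of steps 330 and 370 are "greater than or equal". The function name `tsd_encode` is hypothetical, and the loop test is placed first so that an empty position list is handled.

```python
from math import comb


def tsd_encode(tsd_pos: list[int]) -> int:
    """Follow steps 310 to 370 of Figure 12 and return the event state number.

    tsd_pos holds the positions of the intervals comprising events,
    in ascending order.
    """
    p_s = 0                              # step 310
    k = len(tsd_pos) - 1                 # step 310: number of events - 1
    while k >= 0:                        # step 370 (tested up front here)
        slots = tsd_pos[k]               # step 320
        if k >= slots:                   # step 330: remaining terms are zero
            break
        p_s += comb(slots, k + 1)        # steps 340 and 350
        k -= 1                           # step 360
    return p_s
```

Under these assumptions the positions `[2, 3]` encode to the state 5 and the positions `[0, 1]` encode to the state 0, which is the inverse of the interval-by-interval decoding of Figure 10 for a frame of four intervals.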
Figure 14 illustrates an apparatus 410 for decoding interval positions comprising events in an audio signal frame according to another application of the present invention. Again, as in Figure 9a, a number of frame intervals FSN, indicating the total number of intervals of an audio signal frame, a number of event intervals ESON, indicating the number of intervals comprising events of the audio signal frame, and an event state number ESTN are fed into the decoding apparatus 410. The decoding apparatus 410 differs from the apparatus of Figure 9a in that it also comprises a frame partitioner 440. The frame partitioner 440 is adapted to divide the frame into a first frame partition comprising a first set of intervals of the frame and a second frame partition comprising a second set of intervals of the frame, wherein the positions of intervals comprising events are determined separately for each of the frame partitions. In this way, the positions of the intervals comprising events can be determined by repeatedly dividing a frame or frame partitions into even smaller frame partitions.
The "partition-based" decoding of the device to decode 410 of this application is based on the following concepts, which can be adopted for applications of a device for decoding, a device for encoding, a method of decoding and a method for encoding intervals that comprise events in an audio signal frame. The following concepts are also applicable to the respective computer programs and coded signals:
Partition-based decoding is based on the idea that a frame is divided into two frame partitions A and B, each frame partition consisting of a set of intervals, where frame partition A comprises Na intervals and frame partition B comprises Nb intervals such that Na + Nb = N. The frame can be arbitrarily divided into two partitions, preferably such that partitions A and B have almost the same total number of intervals (for example, such that Na = Nb or Na = Nb-1). By dividing the frame into two partitions, the task of determining the interval positions where events occurred is also divided into two subtasks, namely determining the interval positions where events occurred in frame partition A and determining the interval positions where events occurred in frame partition B.
In this application, it is again assumed that the device for decoding knows the number of frame intervals, the number of intervals comprising events of the frame and an event state number. To solve the two subtasks, the decoding device must also know the number of intervals of each frame partition, the number of intervals where events occurred in each frame partition, and the event state number of each frame partition (such an event state number of a frame partition is now referred to as the "event substate number").
As the decoding apparatus itself divides the frame into two frame partitions, it is known in itself that frame partition A comprises Na intervals and partition B comprises Nb intervals. The determination of the number of intervals comprising events for each of the two frame partitions is based on the following findings:
As the frame is divided into two partitions, each of the intervals comprising events is located either in partition A or in partition B. In addition, considering that P is the number of intervals comprising events of a frame partition, that N is the total number of intervals of the frame partition and that f(P, N) is a function that returns the number of different combinations of event interval positions of a frame partition, the number of different combinations of event interval positions of the entire frame (which was divided into partition A and partition B) is, for each configuration:
0 intervals comprising events in partition A and P in partition B: f(0, Na) · f(P, Nb) combinations
1 interval comprising events in partition A and P-1 in partition B: f(1, Na) · f(P-1, Nb) combinations
2 intervals comprising events in partition A and P-2 in partition B: f(2, Na) · f(P-2, Nb) combinations
...
P intervals comprising events in partition A and 0 in partition B: f(P, Na) · f(0, Nb) combinations
Based on the above considerations, according to an application all combinations with the first configuration, where partition A has 0 intervals comprising events and partition B has P intervals comprising events, must be encoded with an event state number less than a first limit value. The event state number can be coded as an integer value that is positive or 0. Since there are only f(0, Na) · f(P, Nb) combinations with the first configuration, a suitable first limit value can be f(0, Na) · f(P, Nb).
All combinations with the second configuration, where partition A has 1 interval comprising events and partition B has P-1 intervals comprising events, must be encoded with an event state number greater than or equal to the first limit value, but less than a second value. Since there are only f(1, Na) · f(P-1, Nb) combinations with the second configuration, a suitable second value can be f(0, Na) · f(P, Nb) + f(1, Na) · f(P-1, Nb). The event state numbers for combinations with other configurations are determined in a similar way.
According to an application, decoding is performed by separating a frame into two frame partitions A and B. Then, it is tested whether the event state number is less than a first limit value. In a preferred application, the first limit value can be f(0, Na) · f(P, Nb).
If the event state number is less than the first limit value, it can be concluded that partition A comprises 0 intervals comprising events and partition B comprises all P intervals of the frame where events occurred. Decoding is then conducted for both partitions with the respectively determined number of intervals comprising events of the corresponding partition. In addition, a first event state number is assigned to partition A and a second event state number is assigned to partition B, which are respectively used as the new event state number. In this document, an event state number of a frame partition is referred to as an "event substate number".
However, if the event state number is greater than or equal to the first limit value, the event state number can be updated. In a preferred application, the event state number can be updated by subtracting a value from the event state number, preferably by subtracting the first limit value, for example, f(0, Na) · f(P, Nb). In a next step, it is tested whether the updated event state number is less than a second limit value. In a preferred application, the second limit value can be f(1, Na) · f(P-1, Nb). If the event state number is less than the second limit value, it can be derived that partition A has one interval comprising events and partition B has P-1 intervals comprising events. Decoding is then conducted for both partitions with the respectively determined numbers of intervals comprising events of each partition. A first event substate number is used for the decoding of partition A and a second event substate number is used for the decoding of partition B. However, if the event state number is greater than or equal to the second limit value, the event state number can be updated again. In a preferred application, the event state number can be updated by subtracting a value from the event state number, preferably f(1, Na) · f(P-1, Nb). The decoding process is applied in a similar way to the remaining possible distributions of the intervals comprising events over the two frame partitions.
In an application, an event substate number for partition A and an event substate number for partition B can be used for the decoding of partition A and partition B, where both event substate numbers are determined by performing the division: event state number / f(number of intervals comprising events of partition B, Nb)
Preferably, the event substate number of partition A is the integer part of the above division and the event substate number of partition B is the remainder of this division. The event state number used in this division can be the original event state number of the frame or an updated event state number, for example, updated by subtracting one or more limit values, as described above.
To illustrate the concept of partition-based decoding described above, a situation is considered where a frame has two intervals comprising events. Furthermore, let f(p, N) again be the function that returns the number of different combinations of event interval positions of a frame partition, where p is the number of intervals comprising events of the frame partition and N is the total number of intervals of that frame partition. Then, for each of the possible distributions of the positions, the following numbers of possible combinations result:
0 positions in partition A and 2 in partition B: f(0, Na) · f(2, Nb) combinations
1 position in partition A and 1 in partition B: f(1, Na) · f(1, Nb) combinations
2 positions in partition A and 0 in partition B: f(2, Na) · f(0, Nb) combinations
It can therefore be concluded that, if the encoded event state number of the frame is less than f(0, Na) · f(2, Nb), the intervals comprising events must be distributed as 0 and 2. Otherwise, f(0, Na) · f(2, Nb) is subtracted from the event state number, and the result is compared with f(1, Na) · f(1, Nb). If it is smaller, the positions are distributed as 1 and 1. Otherwise, only the distribution 2 and 0 is left, and the positions are distributed as 2 and 0.
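A numeric instance may make the thresholds concrete. The values below are chosen here for illustration and assume that f(p, N) is the binomial coefficient, that the frame has N = 4 intervals split as Na = Nb = 2, and that event state numbers are zero-based:

```latex
\begin{aligned}
f(0,2)\,f(2,2) &= 1 \cdot 1 = 1 && \text{(distribution 0 and 2: state } 0\text{)}\\
f(1,2)\,f(1,2) &= 2 \cdot 2 = 4 && \text{(distribution 1 and 1: states } 1,\dots,4\text{)}\\
f(2,2)\,f(0,2) &= 1 \cdot 1 = 1 && \text{(distribution 2 and 0: state } 5\text{)}\\
1 + 4 + 1 &= 6 = \binom{4}{2}
\end{aligned}
```

The three configurations thus cover all six combinations of two events in four intervals without overlap.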
Next, a pseudocode is provided according to an application for decoding the positions of intervals comprising certain events (here: "pulses") of an audio signal frame. In this pseudocode, "pulses_a" is the (assumed) number of intervals comprising events in partition A and "pulses_b" is the (assumed) number of intervals comprising events in partition B. In this pseudocode, the (possibly updated) event state number is referred to as "state". The event substate numbers of partitions A and B are still jointly encoded in the variable "state". According to the joint coding scheme of an application, the event substate number of A (here referred to as "state_a") is the integer part of the division state / f(pulses_b, Nb), and the event substate number of B (here referred to as "state_b") is the remainder of this division. Therefore, the length (total number of intervals of the partition) and the number of encoded positions (number of intervals comprising events in the partition) of both partitions can be decoded using the same technique:
Function x = decodestate(state, pulses, N)
1. Divide the vector into two partitions of length Na and Nb.
2. For pulses_a from 0 to pulses
   a. pulses_b = pulses - pulses_a
   b. If state < f(pulses_a, Na) * f(pulses_b, Nb), then break the for loop.
   c. state := state - f(pulses_a, Na) * f(pulses_b, Nb)
3. The number of possible states for partition B is no_states_b = f(pulses_b, Nb).
4. The states, state_a and state_b, of partitions A and B, respectively, are the integer part and the remainder of the division state / no_states_b.
5. If Na > 1, then the decoded vector of partition A is obtained recursively by xa = decodestate(state_a, pulses_a, Na). Otherwise (Na == 1), the vector xa is a scalar and we can set xa = state_a.
6. If Nb > 1, then the decoded vector of partition B is obtained recursively by xb = decodestate(state_b, pulses_b, Nb). Otherwise (Nb == 1), the vector xb is a scalar and we can set xb = state_b.
7. The final output x is obtained by merging xa and xb by x = [xa xb].
The output of this algorithm is a vector that has a one (1) at each encoded position (that is, at the interval position of an interval that comprises an event) and a zero (0) elsewhere (that is, at the positions of intervals that do not comprise events).
Next, a pseudocode is provided according to an application for encoding the positions of intervals comprising events in an audio signal frame, which uses similar variable names with a similar meaning, as above:
Function state = encodestate(x, N)
1. Divide the vector into two partitions xa and xb of length Na and Nb.
2. Count the pulses in partitions A and B in pulses_a and pulses_b, and set pulses = pulses_a + pulses_b.
3. Set state to 0.
4. For k from 0 to pulses_a-1
   a. state := state + f(k, Na) * f(pulses-k, Nb)
5. If Na > 1, encode partition A by state_a = encodestate(xa, Na); otherwise (Na == 1), set state_a = xa.
6. If Nb > 1, encode partition B by state_b = encodestate(xb, Nb); otherwise (Nb == 1), set state_b = xb.
7. Encode the states jointly: state := state + state_a * f(pulses_b, Nb) + state_b.
Here, it is assumed that, similar to the decoding algorithm, each encoded position (that is, an interval position of an interval that comprises an event) is identified by one (1) in the vector x and all other elements are zero (0) (that is, at interval positions that do not comprise events).
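The two recursive procedures can be tried out together in a short Python sketch. It is an illustration under assumptions rather than the implementation of the application: f(p, N) is taken to be the binomial coefficient, the frame is split as Na = N // 2 and Nb = N - Na, and states are zero-based. One deliberate deviation is labelled in the comments: for a partition of length 1, which has only a single possible state, the sketch returns the pulse count as the decoded element and 0 as the encoded substate, which is what makes the round trip close.

```python
from math import comb


def f(p: int, n: int) -> int:
    """Number of ways to place p non-overlapping pulses in n intervals."""
    return comb(n, p)


def decodestate(state: int, pulses: int, n: int) -> list[int]:
    """Decode a zero-based state into a 0/1 vector of length n (n >= 1)."""
    if n == 1:
        # Assumption of this sketch: a single interval carries its pulse count.
        return [pulses]
    na = n // 2
    nb = n - na
    for pulses_a in range(pulses + 1):           # step 2
        pulses_b = pulses - pulses_a             # step 2a
        block = f(pulses_a, na) * f(pulses_b, nb)
        if state < block:                        # step 2b
            break
        state -= block                           # step 2c
    state_a, state_b = divmod(state, f(pulses_b, nb))  # steps 3 and 4
    # Steps 5 to 7: decode both partitions and merge the results.
    return decodestate(state_a, pulses_a, na) + decodestate(state_b, pulses_b, nb)


def encodestate(x: list[int]) -> int:
    """Encode a 0/1 vector into its zero-based state (inverse of decodestate)."""
    n = len(x)
    if n == 1:
        # Assumption of this sketch: a single interval has exactly one state.
        return 0
    na = n // 2
    xa, xb = x[:na], x[na:]                      # step 1
    nb = n - na
    pulses_a, pulses_b = sum(xa), sum(xb)        # step 2
    pulses = pulses_a + pulses_b
    state = 0                                    # step 3
    for k in range(pulses_a):                    # step 4
        state += f(k, na) * f(pulses - k, nb)
    # Steps 5 to 7: encode both partitions and combine the substates.
    return state + encodestate(xa) * f(pulses_b, nb) + encodestate(xb)
```

For example, `decodestate(1, 1, 2)` gives `[1, 0]`, and `encodestate([1, 0])` gives back 1.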
The recursive methods formulated above in pseudocode can be easily implemented in a non-recursive way using standard methods.
According to an application of the present invention, the function f(p, N) can be implemented as a look-up table. When the positions do not overlap, as in the current context, the number-of-states function f(p, N) is simply the binomial function, which can be calculated online. It holds that
$f(p, N) = \binom{N}{p} = \frac{N(N-1)(N-2)\cdots(N-p+1)}{p(p-1)(p-2)\cdots 1}$
According to an application of the present invention, both the encoder and the decoder have a loop where the product f(p-k, Na) · f(k, Nb) is calculated for consecutive values of k. For an efficient calculation, this can be written as
$f(p-k, N_a)\, f(k, N_b) = f(p-k+1, N_a)\, f(k-1, N_b) \cdot \frac{(p-k+1)(N_b-k+1)}{(N_a-p+k)\,k}$
In other words, the successive terms for the subtraction/addition (in steps 2b and 2c in the decoder, and in step 4a in the encoder) can be calculated with three multiplications and one division per iteration.
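The per-iteration cost can be checked with a few lines of Python. This sketch assumes that f is the binomial coefficient and uses the recurrence that follows from that definition, namely that the term for k equals the term for k-1 multiplied by (p-k+1)(Nb-k+1) and divided by (Na-p+k)·k. It further assumes p ≤ Na and p ≤ Nb so that no term is zero. The name `product_terms` is hypothetical.

```python
from math import comb


def product_terms(p: int, na: int, nb: int) -> list[int]:
    """Return f(p-k, Na) * f(k, Nb) for k = 0..p using an incremental update.

    Assumes p <= na and p <= nb, so every term is non-zero and each update
    is an exact integer division.
    """
    term = comb(na, p)  # k = 0: f(p, Na) * f(0, Nb)
    terms = [term]
    for k in range(1, p + 1):
        # Three multiplications and one division per iteration.
        term = term * (p - k + 1) * (nb - k + 1) // ((na - p + k) * k)
        terms.append(term)
    return terms
```

For p = 3 and Na = Nb = 4 this gives `[4, 24, 24, 4]`, which sums to 56, the number of ways to place three pulses in eight intervals.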
As with the method described above, the state of a long vector (a frame with many intervals) can be a very large integer, easily exceeding the word length of standard processors. Therefore, it will be necessary to use arithmetic functions capable of handling very long integers.
As for complexity, the method considered here is, in contrast to the interval-by-interval processes above, a divide-and-conquer algorithm.
Considering that the length of the input vector is a power of two, then the recursion has a depth of log2 (N).
Since the number of pulses remains constant at each depth of the recursion, the number of loop iterations is the same at each depth. This results in a total number of loop iterations of pulses · log2(N).
As explained above, each update of f(p-k, Na) · f(k, Nb) can be done with three multiplications and one division.
It should be noted that subtractions and comparisons in the decoder can be considered as a single operation.
It can easily be seen that the partitions are merged log2(N)-1 times. In the joint encoding of states in the encoder, it is therefore necessary to multiply and add log2(N)-1 times. Similarly, in the joint decoding of states in the decoder, it is necessary to divide log2(N)-1 times.
It should be noted that, of the divisions, only the joint decoding of states in the decoder needs divisions in which the denominator is a long integer. The other divisions always have relatively short integers in the denominator. Since divisions with long denominators are the most complex operations, they should be avoided whenever possible.
In summary, the number of arithmetic operations with long integers in the decoder is:
Multiplications: (3·pulses + 1) · log2(N) - 1
Divisions: (pulses + 1) · log2(N) - 1
Divisions with a long denominator: log2(N) - 1
Additions and subtractions: pulses · log2(N)
Similarly, in the encoder there are:
Multiplications: (3·pulses + 1) · log2(N) - 1
Divisions: (pulses + 1) · log2(N) - 1
Divisions with a long denominator: 0
Additions and subtractions: (pulses + 2) · log2(N)
Divisions with a long denominator are thus required only in the decoder.
In other applications, the applications described above that comprise or are adapted to employ recursive processing steps are modified in such a way that some or all of the recursive processing steps are implemented in a non-recursive manner using conventional methods.
Figure 15 illustrates an apparatus for encoding (510) positions of the intervals comprising events in an audio signal frame according to an application. The coding apparatus (510) comprises an event state number generator (530), which is adapted to encode the positions of the intervals by encoding an event state number. In addition, the apparatus comprises an interval information unit (520) adapted to provide a number of frame intervals and a number of event intervals to the event status number generator (530). The event state number generator can implement one of the methods described above for encoding.
In another application, an encoded audio signal is provided. The encoded audio signal comprises an event state number. In another application, the encoded audio signal further comprises a number of event intervals. In addition, the encoded audio signal may also comprise a number of frame intervals. From the encoded audio signal, the positions of the intervals comprising events in an audio signal frame can be decoded according to one of the methods described above for decoding. In an application, the event state number, the number of event intervals and the number of frame intervals are transmitted in such a way that the positions of the intervals comprising events in an audio signal frame can be decoded using one of the methods described above.
The encoded audio signal of the invention can be stored on a digital storage medium or a non-transitory storage medium, or it can be transmitted over a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the Internet.
The following explains the USAC syntax definitions adapted to support a Transient Conducting Decelerator (TSD), according to an application:
Figure 16 illustrates data from MPS 212 (MPEG Surround). MPS 212 data is a data block that contains the charge for the MPS 212 stereo module. MPS 212 data contains TSD data.
Figure 17 represents the syntax of the TSD data. Comprises the number of transient intervals (bsTsdNumTrSlots) and TSD Transient Phase Data (bsTsdTrPhaseData) for the intervals in an MPS 212 data frame. If an interval includes transient data (TsdSepData [ts] is set to 1) bsTsdTrPhaseData phase, otherwise bsTsdTrPhaseData [ts] is set to 0. nBitsTrSlots defines the number of bits used to carry the number of transient intervals (bsTsdNumTrSlots). nBitsTrSlots depends on the number of intervals in an MPS 212 data frame (numSlots). Figure 18 illustrates the relationship between the number of intervals in an MPS 212 data frame and the number of bits used to carry the number of transient intervals.
Figure 19 defines the meaning of tempShapeConfig. tempShapeConfig indicates the operation mode of temporal shaping (STP or GES) or the activation of transient steering decorrelation in the decoder. If tempShapeConfig is set to 0, temporal shaping is not applied at all; if it is set to 1, Subband Domain Temporal Processing (STP) is applied; if it is set to 2, Guided Envelope Shaping (GES) is applied; and if it is set to 3, Transient Steering Decorrelation (TSD) is applied.
Figure 20 illustrates the TempShapeData syntax. If bsTempShapeConfig is set to 3, TempShapeData comprises bsTsdEnable indicating that TSD is enabled on a frame.
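The four modes described for Figure 19 can be summarised in a small lookup. The following Python sketch is only an illustration: the class name TempShapeConfig and the helper tsd_may_be_signalled are hypothetical names chosen for this example, and only the numeric values 0 to 3 and their meanings are taken from the text above.

```python
from enum import IntEnum


class TempShapeConfig(IntEnum):
    """Modes of tempShapeConfig as described for Figure 19 (names are illustrative)."""
    NONE = 0  # temporal shaping is not applied
    STP = 1   # Subband Domain Temporal Processing
    GES = 2   # Guided Envelope Shaping
    TSD = 3   # Transient Steering Decorrelation


def tsd_may_be_signalled(temp_shape_config: int) -> bool:
    """Return True when the mode is 3, i.e. when bsTsdEnable can appear in a frame."""
    return TempShapeConfig(temp_shape_config) is TempShapeConfig.TSD
```

For example, tsd_may_be_signalled(3) is true, while the STP and GES modes never carry the per-frame TSD flag.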
Figure 21 illustrates a decorrelator block D according to an application. The decorrelator block D in the OTT decoding block comprises a signal separator, two decorrelator structures, and a signal combiner. D_AP denotes the all-pass decorrelator, as defined in subclause 7.11.2.5 (All-Pass Decorrelator). D_TR denotes the transient decorrelator.
If the TSD tool is active in the current frame, that is, if (bsTsdEnable == 1), the input signal is separated into a transient stream $v_{X,Tr}^{n,k}$ and a non-transient stream $v_{X,nonTr}^{n,k}$ according to:

The per-interval transient separation flag TsdSepData(n) is decoded from the variable-length code word bsTsdCodedPos by TsdTrPos_dec(), as described below. The code word length of bsTsdCodedPos, that is nBitsTsdCW, is calculated according to:

Returning to Figure 11, this figure illustrates the decoding of the TSD transient interval separation data bsTsdCodedPos into TsdSepData[n], according to an application. An array of length numSlots, consisting of '1's for the encoded transient positions and '0's for the remainder, is generated, as illustrated in Figure 11.
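As a rough illustration of this decoding step, the Python sketch below assumes that bsTsdCodedPos indexes the possible sets of transient positions in the combinatorial number system, and that the code word length is the number of bits needed to index all such sets. Both points are assumptions made for this example; the pseudocode of Figure 11 and the formula for nBitsTsdCW are not reproduced in this text and may differ in detail. The function names code_word_bits and decode_transient_positions are hypothetical.

```python
from math import comb


def code_word_bits(num_slots: int, num_tr_slots: int) -> int:
    """Bits needed to index every choice of num_tr_slots positions out of num_slots.

    Assumption: the code word enumerates all combinations, so the length is
    ceil(log2(C(num_slots, num_tr_slots))), computed here with integers only.
    """
    n_states = comb(num_slots, num_tr_slots)
    return (n_states - 1).bit_length()


def decode_transient_positions(coded_pos: int, num_slots: int, num_tr_slots: int) -> list:
    """Expand a coded position value into per-interval flags (1 = transient).

    num_tr_slots is the actual number of transient intervals, that is
    bsTsdNumTrSlots + 1 when the count is transmitted minus one.
    """
    flags = [0] * num_slots
    state = coded_pos
    remaining = num_tr_slots
    # Walk from the last interval to the first and compare against a threshold.
    for k in range(num_slots - 1, -1, -1):
        if remaining == 0:
            break
        threshold = comb(k, remaining)  # states that leave interval k empty
        if state >= threshold:
            state -= threshold
            flags[k] = 1
            remaining -= 1
    return flags
```

With 4 intervals and 2 transients there are 6 possible position sets, so 3 bits suffice; the value 5 then expands to the flags [0, 0, 1, 1] under the assumed enumeration.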
If the TSD tool is disabled in the current frame, that is, if (bsTsdEnable == 0), the input signal is processed as if TsdSepData (n) = 0 for all n.
Transient signal components are processed in a transient decorrelator structure D_TR as follows:
where

The non-transient signal components are processed in an all-pass decorrelator D_AP, as defined in the following subclause, yielding the decorrelator output for the non-transient signal components,

The de-correlator outputs are added to form the de-correlated signal containing both transient and non-transient components
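The separate, process and combine structure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a frame is modelled as a list of intervals, each holding a list of samples; the two decorrelators are passed in as placeholder callables d_tr and d_ap, since their actual definitions are not given here; and any per-band condition contained in the omitted equations is ignored. The function names are hypothetical.

```python
def separate_streams(frame, tsd_sep_data):
    """Split a frame into a transient and a non-transient stream.

    An interval whose flag is 1 goes to the transient stream, every other
    interval goes to the non-transient stream; the opposite stream gets zeros.
    """
    transient, non_transient = [], []
    for samples, flag in zip(frame, tsd_sep_data):
        silence = [0.0] * len(samples)
        if flag:
            transient.append(list(samples))
            non_transient.append(silence)
        else:
            transient.append(silence)
            non_transient.append(list(samples))
    return transient, non_transient


def decorrelate_frame(frame, tsd_sep_data, d_tr, d_ap):
    """Run each stream through its decorrelator and add the two outputs."""
    transient, non_transient = separate_streams(frame, tsd_sep_data)
    out_tr = d_tr(transient)      # placeholder for the transient decorrelator
    out_ap = d_ap(non_transient)  # placeholder for the all-pass decorrelator
    return [
        [a + b for a, b in zip(row_tr, row_ap)]
        for row_tr, row_ap in zip(out_tr, out_ap)
    ]
```

When all flags are 0, which corresponds to the TSD tool being disabled, the whole frame passes through d_ap only.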

Figure 22 illustrates the EcData syntax comprising bsFreqResStrideXXX. The syntax element bsFreqResStride allows the use of broadband cues in MPS. XXX is to be replaced by the data type value (CLD, ICC, IPD).
The Transient Steering Decorrelator in the OTT decoder structure offers the possibility of applying a specialized decorrelator to the transient components of applause-like signals. The activation of this TSD feature is controlled by the bsTsdEnable flag generated by the encoder, which is transmitted once per frame.
The TSD data in the two-to-one channel module (R-OTT) of the encoder is generated as follows:
- Run a semantic signal classifier that detects applause-like signals. The classification result is transmitted once per frame: the bsTsdEnable flag is set to 1 for applause-like signals, otherwise it is set to 0.
- If bsTsdEnable is set to 0 for the current frame, no TSD data is generated or transmitted for this frame.
- If bsTsdEnable is set to 1 for the current frame, perform the following:
  - Switch on the broadband calculation of the OTT spatial parameters.
  - Detect transients in the current frame (binary decision per MPS time interval).
  - Encode the tsdPosLen transient interval positions held in the vector tsdPos according to the following pseudocode, where the positions in tsdPos are expected in ascending order. Figure 13 illustrates pseudocode for encoding the transient interval positions in tsdPos.
  - Transmit the number of transient intervals (bsTsdNumTrSlots = (number of detected transient intervals) - 1).
  - Transmit the encoded transient positions (bsTsdCodedPos).
  - For each transient interval, calculate a phase measure that represents the broadband phase difference between the downmix signal and the residual signal.
  - For each transient interval, encode and transmit the broadband phase difference measure (bsTsdTrPhaseData).
Finally, Figure 23 illustrates a signal flow diagram for generating TSD data in the two-to-one channel module (R-OTT).
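The encoder-side steps listed above can be sketched in Python as follows. This is a minimal illustration, not the pseudocode of Figure 13, which is not reproduced in this text: it assumes that the ascending positions are mapped to a single integer by the combinatorial number system, it omits the phase data, and it treats a frame flagged as applause-like but without any detected transient as an error, which is a choice made only for this example. The function names and the dictionary layout are hypothetical; only the field names bsTsdEnable, bsTsdNumTrSlots and bsTsdCodedPos come from the text.

```python
from math import comb


def encode_transient_positions(tsd_pos):
    """Map ascending interval positions to one integer.

    Assumption: combinatorial number system, i.e. the i-th smallest position
    (counting from 1) contributes C(position, i) to the result.
    """
    state = 0
    for i, pos in enumerate(tsd_pos):
        state += comb(pos, i + 1)
    return state


def build_tsd_fields(applause_like, transient_flags):
    """Assemble the per-frame TSD fields from the classifier and detector results."""
    if not applause_like:
        # bsTsdEnable == 0: no further TSD data is generated for this frame.
        return {"bsTsdEnable": 0}
    tsd_pos = [n for n, flag in enumerate(transient_flags) if flag]
    if not tsd_pos:
        # Assumption of this sketch: the count is sent minus one, so at least
        # one transient interval is required.
        raise ValueError("no transient interval detected in this frame")
    return {
        "bsTsdEnable": 1,
        "bsTsdNumTrSlots": len(tsd_pos) - 1,
        "bsTsdCodedPos": encode_transient_positions(tsd_pos),
    }
```

For a frame of 4 intervals with transients in the last two, the sketch yields bsTsdNumTrSlots = 1 and bsTsdCodedPos = 5.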
Although some aspects have been described in the context of an apparatus, it is evident that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a characteristic of a method step. Similarly, the aspects described in the context of a method step also represent a description of a corresponding block or item or characteristic of a corresponding device.
Depending on the requirements of certain implementations, the applications of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example, a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are able to cooperate) with a programmable computer system so that the respective method is carried out.
Some applications according to the invention comprise a data carrier with electronically readable control signals, which are able to cooperate with a programmable computer system, in such a way that one of the methods described here is carried out.
In general, the applications of the present invention can be implemented as a computer program product with a program code, the program code being operative to perform one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable carrier.
Other applications comprise the computer program for performing one of the methods described here, stored on a machine-readable carrier or on a non-transitory storage medium.
In other words, an application of the method of the invention is, therefore, a computer program with a program code to perform one of the methods described herein, when the computer program is executed on a computer.
Another application of the method of the invention is, therefore, a data medium (or a digital storage medium, or a computer-readable medium) comprising, recorded on it, the computer program for carrying out one of the methods described herein.
Another application of the method of the invention is, therefore, a stream of data or a sequence of signals representing the computer program for carrying out one of the methods described herein. The data stream or the signal sequence can, for example, be configured to be transferred over a connection for data communication, for example, over the Internet.
An additional application comprises a processing means, for example, a computer or a programmable logic device, configured for or adapted to perform one of the methods described herein.
Another application comprises a computer having a computer program installed on it to perform one of the methods described herein.
In some applications, a programmable logic device (for example, a field programmable gate array) can be used to perform some or all of the functionality of the methods described here. In some applications, a field programmable gate array can cooperate with a microprocessor in order to execute one of the methods described here. In general, the methods are preferably carried out by any hardware device.
The applications described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is therefore the intent to be limited only by the scope of the impending patent claims and not by the specific details presented by way of description and explanation of the applications herein.
Literature:
[1] J. Breebaart, S. van de Par, A. Kohlrausch, E. Schuijers, "High-Quality Parametric Spatial Audio Coding at Low Bitrates", in Proceedings of the AES 116th Convention, Berlin, Preprint 6072, May 2004.
[2] J. Herre, K. Kjörling, J. Breebaart et al., "MPEG Surround - the ISO/MPEG standard for efficient and compatible multi-channel audio coding", in Proceedings of the 122nd AES Convention, Vienna, Austria, May 2007.
[3] V. Pulkki, "Spatial Sound Reproduction with Directional Audio Coding", J. Audio Eng. Soc., Vol. 55, No. 6, 2007.
[4] ISO/IEC International Standard "Information Technology - MPEG audio technologies - Part 1: MPEG Surround", ISO/IEC 23003-1:2007.
[5] J. Engdegard, H. Purnhagen, J. Roden, L. Liljeryd, "Synthetic Ambience in Parametric Stereo Coding", in Proceedings of the AES 116th Convention, Berlin, Preprint, May 2004.

Claims:
Claims (15)
[0001]
1. An apparatus for decoding (10, 40, 60, 410) an encoded audio signal having an audio signal frame comprising intervals and events associated with the intervals, comprising: an analysis unit (20, 42, 70, 420) for analyzing a number of frame intervals indicating the total number of intervals of the audio signal frame, a number of event intervals indicating the number of intervals comprising the events of the audio signal frame, and an event state number; a generating unit (30, 45, 80, 430) for generating an indication of a plurality of interval positions comprising events in the audio signal frame using the number of frame intervals, the number of event intervals and the event state number; and an audio signal processor (50) adapted to generate an audio output signal using the indication of a plurality of interval positions comprising events in the audio signal frame, using the number of frame intervals, the number of event intervals and the event state number.
[0002]
2. An apparatus for decoding (10, 40, 60, 410) according to claim 1, characterized in that the apparatus for decoding (10, 40, 60, 410) is adapted to decode the positions of the transient intervals in an audio signal frame.
[0003]
3. An apparatus for decoding (10, 40, 60, 410) according to claim 1 or 2, characterized in that the analysis unit (20, 42, 70, 420) is adapted to perform a test comparing the event state number or an updated event state number with a threshold value.
[0004]
4. An apparatus for decoding (10, 40, 60, 410) according to claim 3, characterized in that the analysis unit (20, 42, 70, 420) is adapted to perform the test by comparing whether the event state number or an updated event state number is greater than, greater than or equal to, less than, or less than or equal to the threshold value, and wherein the generating unit (30, 45, 80, 430) is also adapted to update the event state number or an updated event state number depending on the test result.
[0005]
5. An apparatus for decoding (10, 40, 60) according to claim 3 or 4, characterized in that the apparatus for decoding (10, 40, 60) also comprises an interval selector (90), wherein the interval selector (90) is adapted to select an interval as a considered interval, wherein the analysis unit (20, 42, 70) is adapted to perform the test with respect to the considered interval, and wherein the threshold value depends on the number of frame intervals, the number of event intervals and the position of the considered interval within the frame.
[0006]
6. An apparatus for decoding (10, 40) according to claim 5, characterized in that the analysis unit (20, 42, 70) is adapted to conduct the test by comparing the event state number or an updated event state number with a threshold value, wherein the threshold value is [formula not reproduced]
[0007]
7. An apparatus for decoding (10, 40, 410) according to one of claims 1 to 4, characterized in that the apparatus for decoding (10, 40, 410) also comprises a frame partitioner (440), wherein the frame partitioner (440) is adapted to divide the frame into a first frame partition which comprises a first set of frame intervals and into a second frame partition which comprises a second set of frame intervals, and wherein the apparatus for decoding (10, 40, 410) is also adapted to determine the interval positions comprising the events for each of the frame partitions separately.
[0008]
8. An apparatus for decoding (10, 60, 410) according to claim 7, characterized in that the audio signal processor (50) is adapted to generate the audio output signal according to a first method, if the indication of a plurality of interval positions comprising the events is in a first indication state, and wherein the audio signal processor (50) is adapted to generate the audio output signal according to a different second method, if the indication of a plurality of interval positions comprising the events is in a second indication state which is different from the first indication state.
[0009]
9. An apparatus for decoding (10, 40, 60, 410) according to claim 8, characterized in that the audio signal processor (50) is adapted in such a way that the first method comprises the use of a transient decorrelator (56) to decode an interval, if the first indication state indicates that the interval comprises a transient, and wherein the second method comprises the use of a second decorrelator (54) to decode an interval, if the second indication state indicates that the interval does not comprise a transient.
[0010]
10. An apparatus for encoding (510) interval positions that comprise events in an audio signal frame, comprising: an event state number generator (530) for encoding the interval positions by encoding an event state number, and an interval information unit (520) adapted to provide a number of frame intervals indicating the total number of intervals of the audio signal frame and a number of event intervals indicating the number of intervals comprising the events of the audio signal frame to the event state number generator (530), characterized in that the event state number, the number of frame intervals and the number of event intervals together indicate a plurality of interval positions that comprise the events in the audio signal frame.
[0011]
11. An apparatus for encoding (510) according to claim 10, characterized in that the event state number generator (530) is adapted to generate an event state number by adding a positive integer value for each interval comprising an event.
[0012]
12. An apparatus for encoding (510) according to claim 10, characterized in that the event state number generator (530) is adapted to generate the event state number by determining a first event substate number for a first frame partition, by determining a second event substate number for a second frame partition, and by combining the first and second event substate numbers to generate the event state number.
[0013]
13. A method for decoding interval positions that comprise events in an audio signal frame, comprising: analyzing a number of frame intervals indicating the total number of intervals of the audio signal frame, a number of event intervals indicating the number of intervals comprising the events in the audio signal frame, and an event state number, and generating an indication of a plurality of interval positions comprising the events in the audio signal frame using the number of frame intervals, the number of event intervals and the event state number.
[0014]
14. A method for encoding the interval positions that comprise events in an audio signal frame, comprising: receiving or determining a number of frame intervals indicating the total number of intervals of the audio signal frame, receiving or determining a number of event intervals indicating the number of intervals that comprise the events of the audio signal frame, and encoding an event state number based on the event state number, the number of frame intervals and the number of event intervals, such that an indication of a plurality of interval positions comprising events in the audio signal frame can be decoded using the number of frame intervals, the number of event intervals and the event state number.
[0015]
15. An encoded audio signal comprising an event state number, characterized in that the positions of the intervals comprising the events can be decoded according to the method of claim 13.

Similar technologies:

Publication number | Publication date | Patent title

BR112013018362B1|2021-01-19|encoding and decoding event interval positions in an audio signal frame

JP6196249B2|2017-09-13|Apparatus and method for encoding an audio signal having multiple channels

BRPI1005299B1|2020-11-24|apparatus and method to perform the upmix on a downmix audio signal

ES2596319T3|2017-01-05|Up Mixer, method and computer program to mix up a down mix audio signal

JP2011522472A|2011-07-28|Parametric stereo upmix device, parametric stereo decoder, parametric stereo downmix device, and parametric stereo encoder

CA2887228C|2019-09-24|Encoder, decoder and methods for backward compatible multi-resolution spatial-audio-object-coding

AU2015201672B2|2016-12-22|Apparatus for generating a decorrelated signal using transmitted phase information

RU2575393C2|2016-02-20|Encoding and decoding of slot positions with events in audio signal frame

BRPI1005360B1|2020-11-03|upmixer device

Patent family:

Publication number | Publication date

KR101657251B1|2016-09-13|

ZA201306173B|2014-04-30|

JP5818913B2|2015-11-18|

CN103620677B|2015-10-14|

US9502040B2|2016-11-22|

US20130304480A1|2013-11-14|

WO2012098098A1|2012-07-26|

EP2477188A1|2012-07-18|

BR112013018362A2|2016-10-04|

CA2824935A1|2012-07-26|

CA2824935C|2016-08-30|

AU2012208673B2|2015-05-14|

EP2666161A1|2013-11-27|

TWI485699B|2015-05-21|

AR084873A1|2013-07-10|

MY155887A|2015-12-15|

CN103620677A|2014-03-05|

JP2014508316A|2014-04-03|

MX2013008364A|2013-08-12|

RU2013138354A|2015-02-27|

KR20130133833A|2013-12-09|

TW201248619A|2012-12-01|

AU2012208673A1|2013-08-29|

SG191988A1|2013-08-30|

Cited documents:

Publication number | Filing date | Publication date | Applicant | Patent title

JP3307138B2|1995-02-27|2002-07-24|ソニー株式会社|Signal encoding method and apparatus, and signal decoding method and apparatus|

US6424938B1|1998-11-23|2002-07-23|Telefonaktiebolaget L M Ericsson|Complex signal activity detection for improved speech/noise classification of an audio signal|

JP4610087B2|1999-04-07|2011-01-12|ドルビー・ラボラトリーズ・ライセンシング・コーポレーション|Matrix improvement to lossless encoding / decoding|

AU2003281128A1|2002-07-16|2004-02-02|Koninklijke Philips Electronics N.V.|Audio coding|

SG108862A1|2002-07-24|2005-02-28|St Microelectronics Asia|Method and system for parametric characterization of transient audio signals|

US7536305B2|2002-09-04|2009-05-19|Microsoft Corporation|Mixed lossless audio compression|

TW594674B|2003-03-14|2004-06-21|Mediatek Inc|Encoder and a encoding method capable of detecting audio signal transient|

US7353169B1|2003-06-24|2008-04-01|Creative Technology Ltd.|Transient detection and modification in audio signals|

KR101217649B1|2003-10-30|2013-01-02|돌비 인터네셔널 에이비|audio signal encoding or decoding|

DE602005022641D1|2004-03-01|2010-09-09|Dolby Lab Licensing Corp|Multi-channel audio decoding|

KR100571574B1|2004-07-26|2006-04-17|한양대학교 산학협력단|Similar Speaker Recognition Method Using Nonlinear Analysis and Its System|

KR20070003593A|2005-06-30|2007-01-05|엘지전자 주식회사|Encoding and decoding method of multi-channel audio signal|

AU2006285538B2|2005-08-30|2011-03-24|Lg Electronics Inc.|Apparatus for encoding and decoding audio signal and method thereof|

WO2007029412A1|2005-09-01|2007-03-15|Matsushita Electric Industrial Co., Ltd.|Multi-channel acoustic signal processing device|

US7974713B2|2005-10-12|2011-07-05|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Temporal and spatial shaping of multi-channel audio signals|

RU2393646C1|2006-03-28|2010-06-27|Фраунхофер-Гезелльшафт Цур Фердерунг Дер Ангевандтен Форшунг Е.Ф.|Improved method for signal generation in restoration of multichannel audio|

DE102006049154B4|2006-10-18|2009-07-09|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Coding of an information signal|

DE102007018032B4|2007-04-17|2010-11-11|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Generation of decorrelated signals|

CN101308655B|2007-05-16|2011-07-06|展讯通信（上海）有限公司|Audio coding and decoding method and layout design method of static discharge protective device and MOS component device|

US8725520B2|2007-09-07|2014-05-13|Qualcomm Incorporated|Power efficient batch-frame audio decoding apparatus, system and method|

TWI433137B|2009-09-10|2014-04-01|Dolby Int Ab|Improvement of an audio signal of an fm stereo radio receiver by using parametric stereo|

WO2014020181A1|2012-08-03|2014-02-06|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Decoder and method for multi-instance spatial-audio-object-coding employing a parametric concept for multichannel downmix/upmix cases|

JP6046274B2|2013-02-14|2016-12-14|ドルビーラボラトリーズライセンシングコーポレイション|Method for controlling inter-channel coherence of an up-mixed audio signal|

WO2014126688A1|2013-02-14|2014-08-21|Dolby Laboratories Licensing Corporation|Methods for audio signal transient detection and decorrelation control|

WO2014126684A1|2013-02-14|2014-08-21|Dolby Laboratories Licensing Corporation|Time-varying filters for generating decorrelation signals|

EP2830053A1|2013-07-22|2015-01-28|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Multi-channel audio decoder, multi-channel audio encoder, methods and computer program using a residual-signal-based adjustment of a contribution of a decorrelated signal|

EP2830051A3|2013-07-22|2015-03-04|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio encoder, audio decoder, methods and computer program using jointly encoded residual signals|

JP6242489B2|2013-07-29|2017-12-06|ドルビーラボラトリーズライセンシングコーポレイション|System and method for mitigating temporal artifacts for transient signals in a decorrelator|

UA117258C2|2013-10-21|2018-07-10|Долбі Інтернешнл Аб|Decorrelator structure for parametric reconstruction of audio signals|

EP2866227A1|2013-10-22|2015-04-29|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Method for decoding and encoding a downmix matrix, method for presenting audio content, encoder and decoder for a downmix matrix, audio encoder and audio decoder|

EP2963645A1|2014-07-01|2016-01-06|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Calculator and method for determining phase correction data for an audio signal|

CN105654959B|2016-01-22|2020-03-06|韶关学院|Adaptive filtering coefficient updating method and device|

JP2020525853A|2017-07-03|2020-08-27|ドルビー・インターナショナル・アーベー|Reduced complexity of dense transient detection and coding|

CA3071208A1|2017-07-28|2019-01-31|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Apparatus for encoding or decoding an encoded multichannel signal using a filling signal generated by a broad band filter|

US10594869B2|2017-08-03|2020-03-17|Bose Corporation|Mitigating impact of double talk for residual echo suppressors|

US10200540B1|2017-08-03|2019-02-05|Bose Corporation|Efficient reutilization of acoustic echo canceler channels|

US10542153B2|2017-08-03|2020-01-21|Bose Corporation|Multi-channel residual echo suppression|

EP3692704A1|2017-10-03|2020-08-12|Bose Corporation|Spatial double-talk detector|

TW201928947A|2017-12-19|2019-07-16|瑞典商都比國際公司|Methods, apparatus and systems for unified speech and audio decoding and encoding decorrelation filter improvements|

US10964305B2|2019-05-20|2021-03-30|Bose Corporation|Mitigating impact of double talk for residual echo suppressors|

Legal status:
2018-12-18| B06F| Objections, documents and/or translations needed after an examination request according art. 34 industrial property law|

2019-09-17| B06U| Preliminary requirement: requests with searches performed by other patent offices: suspension of the patent application procedure|

2020-11-10| B09A| Decision: intention to grant|

2021-01-19| B16A| Patent or certificate of addition of invention granted|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 17/01/2012, SUBJECT TO THE LEGAL CONDITIONS. |

Priority:

Application number | Filing date | Patent title

US201161433803P| true| 2011-01-18|2011-01-18|

US61/433,803|2011-01-18|

EP11172791A|EP2477188A1|2011-01-18|2011-07-06|Encoding and decoding of slot positions of events in an audio signal frame|

EP11172791.3|2011-07-06|

PCT/EP2012/050613|WO2012098098A1|2011-01-18|2012-01-17|Encoding and decoding of slot positions of events in an audio signal frame|
